feat: two-phase status polling via lightweight /status endpoint #446

Merged
xzrderek merged 6 commits into main from sandeep/poll-status on Apr 29, 2026

SunnySoldier357 (Collaborator) commented Apr 23, 2026

Summary

  • Adds get_status() to FireworksTracingAdapter that calls the new /status endpoint on the tracing gateway for fast rollout status lookups.
  • Replaces the polling loop in RemoteRolloutProcessor with a two-phase approach: poll /status for the status code (lightweight point-read), then make a single search_logs call to backfill message/details/extras.

This reduces the hot-path read load on the Logs table from ~1000 RPS to a single read per rollout completion. Depends on the tracing gateway PR that adds the /status endpoint and the Status Spanner table (mono PR in fw-ai/fireworks, number TBD).
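The two-phase flow described in the summary can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `two_phase_poll`, the `"RUNNING"`/`"FINISHED"` string labels, the timeout handling, and the injected `get_status`/`search_logs` callables are all assumptions standing in for the adapter and processor methods.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

RUNNING = "RUNNING"  # hypothetical label; the real code may use numeric status codes


async def two_phase_poll(
    get_status: Callable[[str], Awaitable[Optional[str]]],
    search_logs: Callable[[str], Awaitable[Dict[str, Any]]],
    rollout_id: str,
    poll_interval: float = 0.01,
    timeout: float = 5.0,
) -> Dict[str, Any]:
    """Phase 1: poll the cheap status lookup. Phase 2: one full log read."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await get_status(rollout_id)
        # None means "no status recorded yet"; RUNNING means "not finished".
        if status is not None and status != RUNNING:
            break
        if loop.time() >= deadline:
            raise TimeoutError(f"rollout {rollout_id} did not finish in {timeout}s")
        await asyncio.sleep(poll_interval)
    # Exactly one expensive read, issued only after a terminal status is seen.
    details = await search_logs(rollout_id)
    return {"status": status, **details}


async def _demo() -> Dict[str, Any]:
    # Fake backends that count how often each path is hit.
    answers = iter([None, RUNNING, RUNNING, "FINISHED"])
    calls = {"status": 0, "logs": 0}

    async def fake_get_status(rollout_id: str) -> Optional[str]:
        calls["status"] += 1
        return next(answers)

    async def fake_search_logs(rollout_id: str) -> Dict[str, Any]:
        calls["logs"] += 1
        return {"message": "done", "extras": {"tokens": 12}}

    result = await two_phase_poll(fake_get_status, fake_search_logs, "r-1")
    return {"result": result, "calls": calls}


demo = asyncio.run(_demo())
```

In this sketch the status lookup runs once per poll tick, while the log search runs once per rollout, which is the property the summary relies on to cut reads against the Logs table.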

Test plan

  • Verify get_status() returns status when available, None when not
  • Verify RemoteRolloutProcessor polls /status first, then fetches full logs once
  • Run end-to-end eval with remote rollout to confirm no regressions

Made with Cursor


Note

Medium Risk
Changes remote rollout completion polling logic and introduces a new tracing gateway call path, which could affect rollout completion detection and timeout behavior if the /status response shape/availability differs across deployments.

Overview
Adds FireworksTracingAdapter.get_status() to query a lightweight tracing gateway /status endpoint (with a /v1/status fallback) for rollout status codes.
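One way to picture the fallback between the two paths is the sketch below. It is an assumption-heavy illustration, not the adapter's implementation: the `Fetch` transport type, the response shape, and the rule "treat any non-200 or empty response as a miss and try the next path" are all hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical transport: takes (path, rollout_id), returns (http_status, json_body).
Fetch = Callable[[str, str], Tuple[int, Dict[str, Any]]]

STATUS_PATHS = ("/status", "/v1/status")  # primary path first, then the fallback


def get_status(fetch: Fetch, rollout_id: str) -> Optional[Dict[str, Any]]:
    """Return the status payload, or None when neither path has one."""
    for path in STATUS_PATHS:
        http_status, body = fetch(path, rollout_id)
        if http_status == 200 and body:
            return body
        # Assumed policy: any other outcome falls through to the next path.
    return None


def old_gateway(path: str, rollout_id: str) -> Tuple[int, Dict[str, Any]]:
    # Simulates a deployment that only serves the /v1/status path.
    if path == "/v1/status":
        return 200, {"rollout_id": rollout_id, "status": {"code": "FINISHED"}}
    return 404, {}


def no_status_yet(path: str, rollout_id: str) -> Tuple[int, Dict[str, Any]]:
    # Simulates a rollout with no status available on either path.
    return 404, {}


found = get_status(old_gateway, "r-1")
missing = get_status(no_status_yet, "r-1")
```

Returning `None` instead of raising lets a caller treat "no status available" the same way as "still running" and simply poll again.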

Updates RemoteRolloutProcessor to use a two-phase polling flow: repeatedly poll get_status() until the rollout leaves RUNNING, then perform a single async_search_logs() call to backfill status message/details and propagate extras into execution_metadata.extra, reducing repeated log-table reads during polling.

Reviewed by Cursor Bugbot for commit a17c9b8. Bugbot is set up for automated code reviews on this repo.

@SunnySoldier357 SunnySoldier357 self-assigned this Apr 23, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d162501efc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread eval_protocol/pytest/remote_rollout_processor.py Outdated
Comment thread eval_protocol/pytest/remote_rollout_processor.py Outdated
Comment thread eval_protocol/pytest/remote_rollout_processor.py Outdated
Comment thread eval_protocol/pytest/remote_rollout_processor.py Outdated
Comment thread eval_protocol/adapters/fireworks_tracing.py Outdated
Use the lightweight status endpoint from RemoteRolloutProcessor via the shared aiohttp session and avoid the logs backfill after terminal status is observed.

Made-with: Cursor
Continue polling when the lightweight status endpoint returns RUNNING so the remote rollout processor only exits the poll loop for terminal statuses.

Made-with: Cursor

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 8e0e0ef.

Comment thread eval_protocol/pytest/remote_rollout_processor.py
Read rollout status extras from the top-level status response so RemoteRolloutProcessor preserves metadata that previously came from log entries.
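As a rough illustration of reading extras from the top level of the status response, consider the sketch below. The `ExecutionMetadata` stand-in, the `apply_status_extras` helper, and the `"extras"` key name are assumptions made for the example; the actual response shape and row types are not shown in this conversation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ExecutionMetadata:
    """Hypothetical stand-in for the row's execution metadata."""

    extra: Optional[Dict[str, Any]] = None


def apply_status_extras(
    metadata: ExecutionMetadata, status_response: Dict[str, Any]
) -> None:
    """Merge top-level extras from a status response into metadata.extra."""
    # Assumed shape: extras sit next to "status", not nested inside it.
    extras = status_response.get("extras")
    if not isinstance(extras, dict) or not extras:
        return  # nothing to propagate; leave existing metadata untouched
    merged = dict(metadata.extra or {})
    merged.update(extras)
    metadata.extra = merged


meta = ExecutionMetadata(extra={"attempt": 1})
apply_status_extras(
    meta, {"status": {"code": "FINISHED"}, "extras": {"tokens": 12}}
)

# Extras nested under "status" are ignored by this sketch.
nested = ExecutionMetadata()
apply_status_extras(nested, {"status": {"code": "FINISHED", "extras": {"tokens": 12}}})
```

Merging rather than overwriting keeps any keys already present in `extra`, which matters if other parts of the pipeline also write there.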

Made-with: Cursor
@xzrderek xzrderek enabled auto-merge (squash) April 29, 2026 03:12
@xzrderek xzrderek disabled auto-merge April 29, 2026 03:12
@xzrderek xzrderek merged commit 86a52a4 into main Apr 29, 2026
17 of 18 checks passed
@xzrderek xzrderek deleted the sandeep/poll-status branch April 29, 2026 03:28

2 participants